Abstract

Deadlock detection scheduling is an important, yet often overlooked problem that can
significantly affect the overall performance of deadlock handling. Excessive initiation of deadlock
detection increases overall message usage, resulting in degraded system performance in the
absence of deadlocks, while insufficient initiation of deadlock detection increases the deadlock
persistence time, resulting in an increased deadlock resolution cost in the presence of deadlocks.
The investigation of this performance tradeoff, however, is missing in the literature. This
paper studies the impact of deadlock detection scheduling on the overall performance of deadlock
handling. In particular, we show that there exists an optimal deadlock detection frequency
that yields the minimum long-run mean average cost, which is determined by the message
complexities of the deadlock detection and resolution algorithms being used, as well as the rate of
deadlock formation, denoted as $\lambda$. For the best known deadlock detection and resolution
algorithms, we show that the asymptotically optimal frequency of deadlock detection scheduling that
minimizes the overall message overhead is $\Theta((\lambda n)^{1/3})$, when the total number $n$ of
processes is sufficiently large. Furthermore, we show that in general fully distributed (uncoordinated)
deadlock detection scheduling cannot be performed as efficiently as centralized (coordinated)
deadlock detection scheduling.



Keywords/Index Terms: Deadlock detection scheduling, Deadlock formation rate, Deadlock 
persistence time 



1 Introduction 



The distributed deadlock problem [8] [20] [16] arises from resource contention introduced
by concurrent processes in distributed computational environments. It has received a great deal of
attention in different areas such as distributed computing theory [22] [26], distributed databases
[13] [10], and parallel and distributed simulation [2] [28] [21]. A deadlock is a persistent and
circular-wait condition, where each process involved in a deadlock waits indefinitely for resources
held by other processes while holding resources needed by others. As a result, none of the processes
waiting for needed resources can continue computation any further without obtaining the
waited-for resources. A deadlock has an adverse performance effect that offsets the advantages of
resource sharing and processing concurrency.

* The material in this paper was presented in part at the Twenty-Fourth Annual ACM SIGACT-SIGOPS
Symposium on Principles of Distributed Computing, Las Vegas, Nevada, July 17-20, 2005.

There are three common strategies for dealing with the deadlock problem: deadlock prevention,
deadlock avoidance, and deadlock detection and resolution. It is a long-held consensus that both
deadlock prevention and deadlock avoidance strategies are conservative and less feasible in handling
the deadlock problem in general, whereas the deadlock detection/resolution strategy is widely
accepted as an optimistic and feasible solution to the deadlock problem, because it does not rely on
unrealistic assumptions about the resource allocation requirements of the processes [10] [26] [7] [27].
The central idea behind the deadlock detection and resolution strategy is that it does not preclude
the possibility of deadlock occurring but leaves the burden of minimizing the adverse impact of
deadlock to deadlock detection and resolution mechanisms. Under this scheme, the presence of
deadlocks is detected by a periodic initiation of a deadlock detection algorithm and then resolved
by a deadlock resolution algorithm [31] [27] [7].

Despite significant performance improvement in the past, deadlock detection remains a costly
operation [26] [11] [19]. It requires dynamic maintenance of a wait-for graph (WFG) that reflects the
runtime wait-for dependency among distributed processes, and performs a graph analysis to detect
the presence of deadlocks. There is a substantial tradeoff between the cost of deadlock detection
and that of deadlock resolution [26] [10] [23]. An initiation of deadlock detection consumes runtime
system and network resources, which are pure overhead when no deadlock is present
[19]. Excessive initiation of deadlock detection would reduce the deadlock resolution cost but
result in system performance degradation in the absence of deadlock, while infrequent deadlock
detection would be accompanied by an increased deadlock size, resulting in an increased deadlock
resolution cost in the presence of deadlocks [23] [10] [15]. It is evident that deadlock detection
scheduling is one of the key factors affecting the overall system performance of deadlock handling.
Nevertheless, to the best of our knowledge, this subject is generally missing in the literature.

This paper investigates optimal deadlock detection scheduling. We study how to best schedule
deadlock detections so as to minimize the long-run mean average cost of deadlock handling. We
formulate this problem by introducing a generic cost model (utility metric) and use this cost model
to establish a connection between deadlock detection and deadlock resolution costs, in relation to
the rate of deadlock formation. We show that there exists a unique optimal deadlock detection
frequency that yields the minimum long-run mean average cost. Moreover, our result indicates that
the asymptotically optimal frequency of deadlock detection that minimizes the message overhead
is $\Theta((\lambda n)^{1/3})$, when the number $n$ of processes in the system is sufficiently large.
In addition, we prove that fully distributed (uncoordinated) detection scheduling cannot be performed
as efficiently as its centralized counterpart (coordinated scheduling).

The rest of this paper is organized as follows. Section 2 contains a brief summary of the dis- 
tributed deadlock detection and resolution algorithms. Section 3 gives the notions and definitions. 
Section 4 provides the detailed mathematical analysis and proves the existence and uniqueness of 
an optimal detection frequency. The determination of the optimal deadlock detection frequency, 
its asymptotic relation with the number of processes in a distributed system, and the impact of 
random detection scheduling upon the long-run mean average cost of deadlock handling, are pre- 
sented. In Section 5, the main contribution of this paper is highlighted and the possible future 
work is discussed. 
2 Background

In this section we provide a brief summary of the worst-case analysis of existing distributed detection
algorithms for generalized deadlocks and of deadlock resolution algorithms, since some results will be
used later on. We also touch on Gray's simulation model [8] as well as Massey's formulation [20].

We restrict our discussion to distributed detection and resolution algorithms. The references
[10] [12] [13] provide excellent gateways to the state of the art in this area for the
generalized resource request model. In the following, we give a brief summary of the worst-case
performance of the existing distributed detection algorithms.
Table 1: Distributed Deadlock Detection Algorithms



Table 1 summarizes the worst-case complexities of distributed deadlock detection algorithms
[30] [12] [14], where $n$ is the total number of processes, $e$ the number of edges, $d$ the diameter,
and $l$ the number of sink nodes of the WFG. The distributed detection algorithm for generalized
deadlocks by Kshemkalyani and Singhal [14] is the clear winner among the algorithms listed in
Table 1. Their algorithm achieves a message complexity of $2e$ and a time complexity of $2d$,
which are believed to be optimal. Since $e = n(n - 1)$ and $d = n$ in the worst case, the
worst-case message complexity and time complexity can be bounded by $2n^2$ and $2n$, respectively.

Although deadlock detection and deadlock resolution are often discussed separately, the latter
is as important as the former [10] [26] [13] [23] [16]. The primary issue of deadlock resolution
[15] [16] [17] is to selectively abort a subset of processes involved in the deadlock so as to minimize
the overall abortion cost [19] [26] [27] [7]. This is often referred to as the minimum abort set problem.
These victim (aborted) processes must cancel all pending requests and release all the acquired
resources in order to avoid false deadlock detection and resolution [26]. The abortion cost
thus includes (1) the sending of cancel messages to those resources, and (2) the sending of reply
messages to all the waiting processes that are currently blocked on the resources held by the
aborted processes. One noteworthy point is that these waiting processes could be either transitively
blocked or deadlocked processes. To further reduce the abortion cost, checkpointing is sometimes
introduced to prevent the victim processes from being rolled back from scratch [18].
In addition, it is possible that multiple processes independently detect the same deadlock.
If each process that detects a deadlock resolves it, then the deadlock resolution will be highly
inefficient and will result in subsequent false deadlock detection and deadlock resolution [26] [7]
[13] [15]. Therefore, only one process should be selected for resolving a deadlock, which in turn
requires that the initiations of the deadlock resolution algorithm at different sites be coordinated. Such
coordination for safe deadlock resolution comes at an additional communication cost in message
exchange [7].

Generally, deadlock resolution cost is measured either in terms of time complexity [6] [17] [27], or in
terms of message complexity [15] [16]. The complexity of resolution algorithms is summarized in
Table 2, where $n$ is the total number of processes, $m$ the number of processes having priorities
greater than those of the deadlocked processes, $N_r$ the number of resources, and $n_D$ the size of
a deadlock. Note that the message complexities are not given in [17] [27].
Table 2: Distributed Deadlock Resolution Algorithms



By transforming the problem of deadlock resolution into a minimum vertex cut problem, Lin
& Chen's algorithm [5] can identify an optimal set of victim processes to be aborted, with the
properly selected abortion cost to avoid the starvation and livelock problems. The main feature
of Terekhov & Camp's algorithm is to take the number of resources into account. The deadlock
resolution algorithm proposed by Mendivil et al. [7] uses a probe-based approach, with a focus on
the safety aspect of deadlock resolution. The novelty of this algorithm is to use an additional round
of message exchanges to gather the information needed for efficient resolution after deadlocks are
detected. The algorithm uses special messages known as probes that travel in the opposite direction
of the edges in the AWFG (asynchronous wait-for graph), and then chooses the lowest priority process
of each detected cycle as a victim process to be aborted, hence avoiding the livelock and starvation
problems. This deadlock resolution algorithm [7] excels in the use of formal methods to prove
correctness and in its fine-grained analysis of the algorithm complexities. In particular, its
message complexity is $O(m\,n_D^2)$. The worst-case message complexity can also be written as
$O(n^3)$ because the eventual deadlock size, $n_D$, is bounded by the total number of processes in the
distributed system, that is, $m = O(n)$ and $n_D = O(n)$.

Past research has been primarily aimed at minimizing the complexities (costs) of deadlock
detection and resolution algorithms. Although deadlock detection scheduling (particularly how
frequently deadlock detection should be performed) has a significant impact on the overall
performance of deadlock handling in practice, it has not been explicitly studied but is rather
implicitly reflected in the descriptions of deadlock detection algorithms, without a clear guideline.
For instance, in [10] [26] [19], the authors stated that a deadlock detection is initiated when a
deadlock is suspected. Other works [23] [11] suggested that it would be highly inefficient if deadlock
detection were performed whenever a process/transaction becomes blocked.

The performance of deadlock handling depends not only on the per-detection cost of the deadlock
detection algorithm, but also on how frequently the deadlock detection algorithm is executed
[11] [23] [19]. The choice of deadlock detection frequency presents a tradeoff between deadlock
detection cost and deadlock resolution cost [10] [26] [23] [13]. Park et al. [23] pointed out that the
reduction of deadlock resolution cost can be achieved at the expense of deadlock detection cost.
Krivokapic et al. [11] showed in their simulation study that the path-pushing algorithm (one type
of deadlock detection algorithm) is highly sensitive to the frequency of deadlock detection. Gray
et al. [8] showed that a transaction rarely has to wait for a lock request. They used
a "straw-man analysis" in their simulation model that agreed well with observations on several
data management systems. Massey [20] formulated a probabilistic model that gave an analytic
justification for the simulation results reported in [8], showing that the probability of deadlock grows
linearly with the number of transactions and as the fourth power of the average
number of resources required by transactions.

To the best of our knowledge, only a few papers [8] [16] mention deadlock
detection scheduling, and they do so in a different context from this paper. The idea of relating deadlock
recovery cost to deadlock persistence time, and identifying an optimal deadlock detection frequency
that minimizes the long-run mean average cost from the perspective of deadlock handling, has not
been considered before.

3 Deadlock Persistence Time and Deadlock Recovery Cost

In this section, we first give the following definitions in order to simplify problem formulation. 

Definition 1 A deadlock refers to a circular-wait condition where a set of processes wait
indefinitely for resources from each other. A blocked process (a process in a deadlock) refers to a process
that waits indefinitely on other processes to progress. Deadlock size refers to the total number of
blocked processes involved in the deadlock.
Blocked processes can be decomposed into two categories: deadlocked and transitively blocked
processes [16]. Deadlocked processes belong to a cycle in the WFG, while a transitively blocked process
refers to one that waits for the resources held by other processes but does not belong to any cycle
in the WFG.
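
The distinction drawn in Definition 1 can be made concrete with a small sketch. It assumes that the WFG is stored as a dictionary mapping each process to the set of processes it waits for, and that a process is blocked as soon as any process it waits for is blocked. The function names `reachable` and `classify` and the sample graph are illustrative and do not come from the paper.

```python
def reachable(wfg, start):
    """Return the set of processes reachable from `start` along wait-for edges."""
    seen, stack = set(), list(wfg.get(start, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(wfg.get(p, ()))
    return seen


def classify(wfg):
    """Split blocked processes into deadlocked and transitively blocked ones.

    A process is deadlocked if it lies on a cycle (it can reach itself), and
    transitively blocked if it is not on a cycle but can reach a deadlocked
    process (assumption: waiting on a blocked process blocks the waiter).
    """
    procs = set(wfg) | {q for targets in wfg.values() for q in targets}
    reach = {p: reachable(wfg, p) for p in procs}
    deadlocked = {p for p in procs if p in reach[p]}
    transitively_blocked = {p for p in procs - deadlocked if reach[p] & deadlocked}
    return deadlocked, transitively_blocked


# Hypothetical WFG: P1 -> P2 -> P3 -> P1 is a cycle; P4 and P5 wait on it; P6 is free.
wfg = {"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"},
       "P4": {"P1"}, "P5": {"P4"}, "P6": set()}
deadlocked, transitively_blocked = classify(wfg)
deadlock_size = len(deadlocked) + len(transitively_blocked)
print(sorted(deadlocked), sorted(transitively_blocked), deadlock_size)
```

The reachability test is quadratic in the worst case, which is adequate for illustration; a production detector would rather use a strongly connected components algorithm.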

Definition 2 Two deadlocks are said to be independent of each other if they do not share any
deadlocked process.

The independence of deadlock occurrence can be justified by the wide acceptance of large-scale
distributed systems and the adoption of fine-granularity locking mechanisms such as semantic locking
[24] [11] and record-granularity locking [24]. After decades of research and development,
large-scale distributed systems allow resource sharing among hundreds or even thousands of sites across
a network [24] [11]. The fine-granularity locking mechanisms enable a higher degree of parallelism.
Large-scale resource distribution and fine-granularity locking make deadlocks likely to form
independently.

Now we are in a position to introduce the notion of deadlock persistence time, which serves as a
basis for our problem formulation. Let $S = \{S_1, S_2, \ldots\}$ be the time instants at which independent
deadlocks initially occur, i.e., the $i$th deadlock forms at time $S_i$.

Definition 3 The persistence time of the $i$th deadlock with respect to time $t$, denoted by $t_p(t, S_i)$,
is

$$t_p(t, S_i) = t - S_i, \quad t \ge S_i.$$

The function $t_p(t, S_i)$ represents the time interval between the present time and the time at which
the deadlock is initially formed. It grows linearly until the deadlock is resolved. The notion of
deadlock persistence time is similar in spirit to that of deadlock latency or deadlock duration.
Once a deadlock is formed, other processes requesting resources currently held by the blocked
processes in the deadlock (including deadlocked and transitively blocked processes) will be blocked
forever unless the deadlock is resolved. As a result, each deadlock acts as an attractor that traps more
processes into it. As the deadlock persistence time increases, the size of the deadlock (the total
number of processes involved in the deadlock) keeps growing [26], which in turn increases
the deadlock resolution cost.



This dependency of deadlock resolution cost upon deadlock persistence time is illustrated by the
example in Fig. 1. At time = 1, there are three circularly deadlocked processes and two transitively
blocked processes. At time = 2, there are seven circularly deadlocked processes. Graphs (a)
and (b) in Fig. 1 represent two snapshots of the wait-for graph, showing that the deadlock size
(including both deadlocked and transitively blocked processes) grows with the deadlock persistence
time. Intuitively, a deadlock resolution algorithm will have to explore the entire deadlock in order
to identify the least costly set of victim processes to be aborted. The intrinsic dependency of
deadlock size (and thus deadlock resolution cost) upon deadlock persistence time was observed by
Singhal et al. [26] [29], Lee, Krivokapic et al. [11], Lin et al. [17], and Park et al. [23].



Figure 1: Increasing Deadlock Size with Deadlock Persistence Time. Panels: (a) time = 1; (b) time = 2.
Throughout this paper, we use $n$ to denote the total number of processes in a distributed system
and $n_D(\cdot)$ to denote the size of a deadlock. Consider an arbitrary deadlock. Its size is a function of
deadlock persistence time $t_p$, denoted as $n_D(t_p)$. The deadlock size $n_D(t_p)$ by nature is a discrete
staircase function that jumps by one whenever a new process becomes transitively blocked by the
deadlocked processes. To facilitate our mathematical analysis, we will treat $n_D(t_p)$ instead as a
continuous, increasing function, which is an approximation of the staircase one.

The deadlock size function $n_D(t_p)$ has the following mathematical properties: (1) $n_D(0) = 0$, (2)
monotonicity: $n'_D(t_p) > 0$, $t_p > 0$, and (3) boundedness: $n_D(\infty) \le n$, where $n'_D(t_p)$ is the
derivative of $n_D(t_p)$. The first property states that the initial deadlock size at $t_p = 0$ is zero. The
second property reflects the fact that the number of blocked processes in the deadlock increases
monotonically with deadlock persistence time $t_p$, and the third property indicates that the eventual
deadlock size is bounded by the total number of distributed processes. For ease of presentation, we drop
the subscript $p$ hereafter.
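
As a concrete illustration, the following sketch checks the three properties numerically for one hypothetical deadlock size function, $n_D(t) = n(1 - e^{-kt})$. Neither this functional form nor the growth parameter `k` comes from the paper; they are assumptions chosen only because they satisfy properties (1)-(3). The sampling parameters `horizon` and `steps` are likewise arbitrary.

```python
import math


def deadlock_size(t, n, k=1.0):
    """Hypothetical continuous deadlock size n_D(t) = n * (1 - exp(-k * t))."""
    return n * -math.expm1(-k * t)


def check_properties(n, k=1.0, horizon=20.0, steps=200):
    """Sample n_D on [0, horizon] and test the three stated properties."""
    samples = [deadlock_size(horizon * i / steps, n, k) for i in range(steps + 1)]
    starts_at_zero = samples[0] == 0.0                                  # property (1)
    non_decreasing = all(a <= b for a, b in zip(samples, samples[1:]))  # property (2)
    bounded = all(s <= n for s in samples)                              # property (3)
    return starts_at_zero, non_decreasing, bounded


result = check_properties(n=100)
print(result)
```

Any other smooth, increasing, saturating curve would serve equally well; the analysis only relies on the three properties, not on this particular shape.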

Now let us revisit the message complexity achieved by the deadlock resolution algorithm proposed
by Mendivil et al. [7], which is $O(m\,n_D^2) = O(n\,n_D^2)$, where $m$ is the number of processes
having priority values greater than those of the deadlocked processes. Notice that the deadlock
size, $n_D$, is a function of deadlock persistence time. To make this dependency concrete, the message
overhead can be written as $c\,n\,n_D^2(t)$ for some constant $c$. This result will be used later to derive
the optimal frequency of deadlock detection scheduling.

4 Mathematical Formulation 

In this section, we begin with a generic cost model that accounts for both deadlock detection and
deadlock resolution, and which is independent of the deadlock detection/resolution algorithms being used.
We then prove the existence and the uniqueness of an optimal deadlock detection frequency that
minimizes the long-run mean average cost in terms of the message complexities of the best known
deadlock detection/resolution algorithms.

In this paper we choose message complexity as the performance metric for measuring the
detection/resolution cost. The reason for choosing message complexity is that communication
overhead is generally a dominant factor that affects the overall system performance in a distributed
system [20] [10], as compared with processing speed and storage space. Note that the
worst-case message complexity can normally be expressed as a polynomial of $n$. The per-detection
cost is denoted as $C_D$. The resolution cost for a deadlock is denoted as $C_R(t)$, which is a
function of the deadlock persistence time $t$. In general, the resolution cost is a polynomial of $n_D(t)$.
For example, the deadlock resolution cost for Mendivil's algorithm [7] is $c\,n\,n_D^2(t)$. Because $n_D(t)$
is a monotonically increasing function of deadlock persistence time, $C_R(t)$ is also monotonically
increasing with deadlock persistence time. We assume that deadlock formation follows a Poisson
process for two reasons. First, the Poisson process is widely used to approximate a sequence of
events that occur randomly and independently. Second, the Poisson process is mathematically
tractable, which allows us to characterize the essential aspects of complicated processes
while keeping the problem analytically solvable.

The following theorem presents the long-run mean average cost of deadlock handling in
connection with the rate of deadlock formation and the frequency of deadlock detection.

Theorem 1 Suppose deadlock formation follows a Poisson process with rate $\lambda$. The long-run mean
average cost of deadlock handling, denoted by $C(T)$, is

$$C(T) = \frac{C_D}{T} + \frac{\lambda}{T} \int_0^T C_R(t)\,dt, \tag{1}$$

where the frequency of deadlock detection scheduling is $1/T$. ▲

Proof: Let $\{X_i, i \ge 1\}$ be the interarrival times of independent deadlock formations, where the random
variables $X_i$, $i \ge 1$, are independent and exponentially distributed with mean $1/\lambda$. Define $S_0 = 0$
and $S_n = \sum_{i=1}^{n} X_i$, where $S_n$ represents the time instant at which the $n$th independent deadlock
occurs.

Let $N(t) = \sup\{n : S_n \le t\}$ represent the number of deadlock occurrences within the time
interval $(0, t]$. The long-run mean average cost is

$$\lim_{t \to \infty} \frac{E[\text{random cost in } (0, t]]}{t}, \tag{2}$$

where $E$ is the expectation function. In order to associate this cost with the deadlock detection
frequency ($1/T$), we partition the time interval $(0, t]$ into non-overlapping subintervals of length $T$.
Let $\varepsilon_k(T)$ be the cost of deadlock handling on the subinterval $((k - 1)T, kT]$, $k > 0$. $\varepsilon_k(T)$ is a
random variable. According to the stationary and independent increments of the Poisson process [25],
$E(\varepsilon_i(T)) = E(\varepsilon_j(T))$, $i \ne j$. The long-run mean average cost becomes

$$C(T) = \lim_{t \to \infty} \frac{E[\text{random cost in } (0, t]]}{t}
= \lim_{t \to \infty} \frac{E\left(\sum_{k=1}^{\lfloor t/T \rfloor} \varepsilon_k(T)\right)}{t}
= \lim_{t \to \infty} \frac{\lfloor t/T \rfloor \, E(\varepsilon(T))}{t}
= \frac{E(\varepsilon(T))}{T}, \tag{3}$$

where $\lfloor x \rfloor$ is the floor function in $x$.

The cost $\varepsilon(T)$ on interval $(0, T]$ is the sum of a deadlock detection cost $C_D$ and a deadlock
resolution cost for those deadlocks independently formed within the interval $(0, T]$. For the $i$th
deadlock formed at time $S_i < T$, the resolution cost $C_R(T - S_i)$ is a function of the deadlock
persistence time $T - S_i$. Hence, the accrued total cost over $(0, T]$ is

$$\varepsilon(T) = C_D + \sum_{i=1}^{N(T)} C_R(T - S_i) I_{\{N(T) > 0\}}, \tag{4}$$

where $I_\theta$ is the indicator function whose value is 1 (or 0) if predicate $\theta$ is true (or false). Of
this, the deadlock resolution cost on interval $(0, T]$ is

$$\sum_{i=1}^{N(T)} C_R(T - S_i) I_{\{N(T) > 0\}} = \sum_{i=1}^{\infty} C_R(T - S_i) I_{\{S_i < T\}}. \tag{5}$$
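The expectation of the $i$th term is

$$E\left(C_R(T - S_i) I_{\{S_i < T\}}\right) = \int_0^T C_R(T - t) f_i(t)\,dt, \tag{6}$$

where $f_i(t)$ is the probability density function of $S_i$, which follows the gamma distribution given
below:

$$f_i(t) = \frac{\lambda^i t^{i-1}}{(i-1)!} e^{-\lambda t}. \tag{7}$$

Substituting Eq. (7) into Eq. (6) gives rise to

$$E\left(C_R(T - S_i) I_{\{S_i < T\}}\right) = \int_0^T C_R(T - t) \frac{\lambda^i t^{i-1}}{(i-1)!} e^{-\lambda t}\,dt. \tag{8}$$

The expected total resolution cost over the time interval $(0, T]$ is

$$E\left(\sum_{i=1}^{N(T)} C_R(T - S_i) I_{\{N(T) > 0\}}\right)
= \sum_{i=1}^{\infty} \int_0^T C_R(T - t) \frac{\lambda^i t^{i-1}}{(i-1)!} e^{-\lambda t}\,dt
= \int_0^T C_R(T - t) \lambda e^{-\lambda t} \left(\sum_{i=1}^{\infty} \frac{(\lambda t)^{i-1}}{(i-1)!}\right) dt
= \lambda \int_0^T C_R(T - t)\,dt = \lambda \int_0^T C_R(t)\,dt. \tag{9}$$

Combining Eqs. (3), (4), and (9) yields

$$C(T) = \frac{E(\varepsilon(T))}{T} = \frac{C_D + \lambda \int_0^T C_R(T - t)\,dt}{T} = \frac{C_D + \lambda \int_0^T C_R(t)\,dt}{T}. \tag{10}$$

Theorem 1 is thus established. ■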
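
The closed form of Theorem 1 can be sanity-checked by simulation. The sketch below is only an illustration: it borrows the cost functions $C_R(t) = n^3(1 - e^{-t})$ and $C_D = n^2$ from the example given later in this section, while the values $n = 10$, $\lambda = 1$, $T = 0.5$, the number of simulated periods, and the function names are arbitrary assumptions.

```python
import math
import random


def analytic_cost(c_d, lam, period, resolution_cost_integral):
    """C(T) = (C_D + lam * int_0^T C_R(t) dt) / T, as stated in Theorem 1."""
    return (c_d + lam * resolution_cost_integral(period)) / period


def simulated_cost(c_d, lam, period, resolution_cost, num_periods, rng):
    """Estimate C(T) by simulating Poisson deadlock formation period by period."""
    total = 0.0
    for _ in range(num_periods):
        total += c_d  # one detection is run at the end of every period
        # By memorylessness, the Poisson process can be restarted in each period.
        formed_at = rng.expovariate(lam)
        while formed_at < period:
            # A deadlock formed at S_i has persisted for T - S_i when detected.
            total += resolution_cost(period - formed_at)
            formed_at += rng.expovariate(lam)
    return total / (num_periods * period)


N, LAM, T = 10, 1.0, 0.5          # illustrative values, not from the paper
C_D = N ** 2


def c_r(t):
    return N ** 3 * (1.0 - math.exp(-t))


def c_r_integral(x):
    return N ** 3 * (x + math.exp(-x) - 1.0)


exact = analytic_cost(C_D, LAM, T, c_r_integral)
estimate = simulated_cost(C_D, LAM, T, c_r, 200_000, random.Random(1))
print(exact, estimate)
```

With a few hundred thousand periods the simulated average should agree with the analytic value to within a fraction of a percent, which is what the theorem predicts for this cost model.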
Theorem 1 is mainly concerned with the impact of deadlock detection frequency and deadlock
formation rate on the long-run mean average cost of overall deadlock handling. It is independent
of the choice of deadlock detection/resolution algorithms. The following corollary is an immediate
consequence of Theorem 1.

Corollary 1 The long-run mean average cost of deadlock handling grows linearly with the rate of
deadlock formation $\lambda$. ▲

Proof: The proof is straightforward and thus omitted. ■
Theorem 1 and Corollary 1 state that the overall cost of deadlock handling is closely associated
not only with the per-detection cost and the aggregated resolution cost, but also with the rate
of deadlock formation, $\lambda$. In the following lemma, we show the existence and uniqueness of an
asymptotically optimal frequency of deadlock detection when deadlock resolution is more expensive
than deadlock detection in terms of message complexity.

Lemma 1 Suppose that the message complexity of deadlock detection is $O(n^\alpha)$, and that of deadlock
resolution is $O(n^\beta)$. If $\alpha < \beta$, there exists a unique deadlock detection frequency $1/T^*$ that yields
the minimum long-run mean average cost when $n$ is sufficiently large. ▲

Proof: Differentiating Eq. (1) yields

$$C'(T) = -\frac{C_D}{T^2} + \frac{\lambda C_R(T)}{T} - \frac{\lambda}{T^2} \int_0^T C_R(t)\,dt. \tag{11}$$

Define a function $\varphi(T)$ as follows:

$$\varphi(T) = T^2 C'(T) = -C_D + \lambda T C_R(T) - \lambda \int_0^T C_R(t)\,dt. \tag{12}$$

Notice that $C'(T)$ and $\varphi(T)$ share the same sign. Differentiating $\varphi(T)$, we have

$$\varphi'(T) = \lambda T C'_R(T). \tag{13}$$

Because $C_R(T)$ is a monotonically increasing function, $C'_R(T) > 0$, which means $\varphi'(T) > 0$.
Therefore, $\varphi(T)$ is also a monotonically increasing function. $C_R(T) - C_R(t) > 0$ holds iff $T > t$.
For any given $0 < \xi < T$, we have

$$T C_R(T) - \int_0^T C_R(t)\,dt = \int_0^T (C_R(T) - C_R(t))\,dt > \int_0^\xi (C_R(T) - C_R(t))\,dt
> \int_0^\xi (C_R(T) - C_R(\xi))\,dt = \xi (C_R(T) - C_R(\xi)). \tag{14}$$

Applying Eq. (14) to Eq. (12), we have

$$\varphi(T) = -C_D + \lambda \left(T C_R(T) - \int_0^T C_R(t)\,dt\right) > -C_D + \lambda \xi (C_R(T) - C_R(\xi)). \tag{15}$$
We further have

$$\varphi(T) > -C_D + \lambda \xi C_R(T) \left(1 - \frac{C_R(\xi)}{C_R(T)}\right) = -C_D + \lambda \xi C_R(T)\,\theta, \tag{16}$$

where $\theta = (1 - C_R(\xi)/C_R(T))$ and $0 < \theta < 1$ since $C_R(T)$ is monotonically increasing. Substituting
$C_D = c_1 n^\alpha$ and $C_R(\infty) = c_2 n^\beta$ in Eq. (16), we obtain

$$\lim_{T \to \infty} \varphi(T) > -c_1 n^\alpha + \lambda \xi \theta c_2 n^\beta. \tag{17}$$

Since $\alpha < \beta$, $\lim_{T \to \infty} \varphi(T)$ is asymptotically dominated by the term $\lambda \xi \theta c_2 n^\beta$ when $n$ is
sufficiently large. Observe that $\varphi(0) = -C_D < 0$, and $\varphi(T)$ is monotonically increasing. By the
intermediate value theorem, it must be true that there exists a unique $T^*$, $0 < T^* < \infty$, such that

$$\varphi(T) = T^2 C'(T) \begin{cases} < 0, & 0 < T < T^* \\ = 0, & T = T^* \\ > 0, & T > T^*. \end{cases}$$

This means that $C(T)$ reaches its minimum at and only at $T = T^*$. The existence and the uniqueness
of the optimal deadlock detection interval $T^* = \arg\min_{T > 0} C(T)$ are thus proved. ■
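
Because $\varphi(T)$ is increasing and changes sign exactly once, $T^*$ can be located numerically by bisection. The following is a minimal sketch under stated assumptions: the integral is approximated with Simpson's rule, the resolution cost has the form $c\,n\,n_D^2(t)$ with a hypothetical deadlock size $n_D(t) = n(1 - e^{-t})$, and the values of `N`, `LAM`, and `C` as well as all function names are illustrative.

```python
import math


def simpson(f, a, b, intervals=200):
    """Composite Simpson's rule on [a, b]; `intervals` must be even."""
    h = (b - a) / intervals
    total = f(a) + f(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def phi(period, c_d, lam, c_r):
    """phi(T) = -C_D + lam * (T * C_R(T) - int_0^T C_R(t) dt), cf. Eq. (12)."""
    return -c_d + lam * (period * c_r(period) - simpson(c_r, 0.0, period))


def mean_cost(period, c_d, lam, c_r):
    """Long-run mean average cost C(T) from Theorem 1."""
    return (c_d + lam * simpson(c_r, 0.0, period)) / period


def optimal_period(c_d, lam, c_r, tol=1e-10):
    """Bisection for the unique root T* of the increasing function phi."""
    lo, hi = 0.0, 1.0
    while phi(hi, c_d, lam, c_r) <= 0.0:   # widen the bracket until phi > 0
        hi *= 2.0
        if hi > 1e9:
            raise ValueError("phi never becomes positive; no finite optimum")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid, c_d, lam, c_r) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


N, LAM, C = 100, 1.0, 1.0                  # illustrative values
c_d = 2 * N ** 2                           # detection cost 2 n^2


def c_r(t):
    # c * n * n_D(t)^2 with the assumed n_D(t) = n * (1 - exp(-t))
    return C * N * (N * (1.0 - math.exp(-t))) ** 2


t_star = optimal_period(c_d, LAM, c_r)
print(t_star, mean_cost(t_star, c_d, LAM, c_r))
```

Bisection is chosen here over Newton's method because it needs only the sign of $\varphi$, which is exactly what the lemma guarantees.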

To make the idea behind this derivation concrete, we apply the up-to-date results of deadlock
detection/resolution algorithms. As discussed before, the best-known message complexity of a
distributed deadlock detection algorithm is $2n^2$ [14] when it is written as a polynomial of $n$. The
best-known message complexity of a deadlock resolution algorithm is $O(n\,n_D^2)$ [7]. Therefore,
$C_D = 2n^2$ and $C_R(t) = c\,n\,n_D^2(t)$, where $c$ is a positive constant. Because the deadlock size
$n_D(t)$ is always bounded by $n$, from Eq. (15) we have

$$\varphi(\infty) = \lim_{T \to \infty} \varphi(T) > -C_D + \lambda \xi (C_R(\infty) - C_R(\xi)) \approx -2n^2 + \lambda c \xi n^3. \tag{18}$$

Note that $\xi$ is a fixed value that can be arbitrarily chosen. For a sufficiently large $n$, Eq. (18) becomes

$$\varphi(\infty) \approx \lambda c \xi n^3 > 0. \tag{19}$$

Meanwhile, $\varphi(0) = -C_D = -2n^2$. Because $\varphi(T)$ is monotonically increasing, there exists an optimal
deadlock detection frequency $1/T^*$ such that $\varphi(T^*)$ and thus $C'(T^*)$ are zero, which minimizes the
long-run mean average cost $C(T)$ for deadlock handling.

The motivation behind the proof is that the cost per deadlock detection is fixed when the 
total number of processes in the distributed system is given, while the cost of deadlock resolution 
monotonically increases with deadlock persistence time. The resolution cost will eventually outgrow 
the detection cost if deadlocks persist. As we set the time interval T between any two consecutive 
detections longer, the detection cost becomes smaller due to less frequent executions of the detection 
algorithm, but the resolution cost becomes larger due to the growth in deadlock size. This implies 
that there exists a unique deadlock detection frequency 1/T* that balances the two costs such that 
their sum is minimized. The condition that the asymptotic deadlock resolution cost, $C_R(\infty)$, is
greater than the cost of deadlock detection, $C_D$, constitutes the natural mathematical basis to
justify distributed deadlock detection algorithms.

We are now ready to state the asymptotically optimal frequency for deadlock detection based
on the up-to-date results of distributed deadlock detection and resolution algorithms. Recall that
the best-known message complexity for distributed deadlock detection algorithms is $2n^2$ [14] and
that for deadlock resolution algorithms is $O(n\,n_D^2)$ [7].

Theorem 2 Suppose the message complexity for distributed deadlock detection is $2n^2$, and that
for distributed deadlock resolution is $O(n\,n_D^2(t))$. Then the asymptotically optimal frequency for
scheduling deadlock detections is $\Theta((\lambda n)^{1/3})$. ▲

Proof: Assume that the deadlock size function $n_D(t)$ is both differentiable and integrable. (Recall
that $n_D(t)$ is a continuous approximation function whose curves between "jumping points" can be
chosen.) Then $n_D(t)$ can be expressed in the form of a Maclaurin series as follows:

$$n_D(t) = \sum_{i=0}^{\infty} \frac{n_D^{(i)}(0)}{i!} t^i = \sum_{i=0}^{\infty} c_i t^i, \tag{20}$$

where $n_D^{(i)}(0)$ denotes the $i$th derivative of the deadlock size function $n_D(t)$ at point zero and
$c_i = n_D^{(i)}(0)/i!$.

By the properties of the deadlock size function $n_D(t)$, we have $n_D(0) = 0$ and $n'_D(0) > 0$. It
can be easily verified that $c_0 = 0$ and $c_1 = n'_D(0) > 0$. The resolution cost $C_R(t)$ can be written as
$c\,n\,n_D^2(t)$ for some constant $c$. By Theorem 1, the long-run mean average cost becomes

$$C(T) = \frac{2n^2}{T} + \lambda c n \frac{\int_0^T n_D^2(t)\,dt}{T}. \tag{21}$$

Inserting Eq. (20) into Eq. (21), we have

$$C(T) = \frac{2n^2}{T} + \frac{\lambda c n}{T} \int_0^T \left(\sum_{i=1}^{\infty} c_i t^i\right)^2 dt. \tag{22}$$

Through a lengthy calculation, Eq (|22|) can be simplified as 



ceo - ¥ + ^ + ^> + a " s (E £ ffO- <»> 

i=2 j=2 ^ J " r 

Taking derivative of Eq (l23p with respect to T, we have 

cm = _£ + cA„3(c;f + « + «*„.<£ £ ^+^ J " ). (2 4) 

By lemmaHJ there exists a unique optimal detection frequency 1/T* when n is sufficiently large, 

such that C(T*) < C(T), T G (0,oo). We know that C'(T*) = 0. Based on (jMJ), we transform 

C'(T*) = to the following equation. 

1 = cA 2cf(T*) 3 + 3c lC2 (r*) 4 + ~ ~ c^+jKT^W 
n2 3 2 ~^~t, i + j + 1 

Only $n$, $T^*$, and $\lambda$ are free variables and the rest are constants. By performing the Big-O
reduction we obtain

$$ \frac{1}{n} = \Theta\big(\lambda((T^*)^3 + (T^*)^4 + (T^*)^5 + \cdots)\big). \tag{26} $$

When $n$ is sufficiently large and $T^*$ is sufficiently small, the lowest-order term dominates and we have

$$ \frac{1}{n} = \Theta(\lambda (T^*)^3) \implies T^* = \Theta\Big(\Big(\frac{1}{\lambda n}\Big)^{1/3}\Big). \tag{27} $$

Therefore, the asymptotically optimal deadlock detection frequency $1/T^*$ is $\Theta((\lambda n)^{1/3})$.
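
The cube-root scaling can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it keeps only the detection term $2n^2/T$ and the leading resolution term $c\lambda n^3 c_1^2 T^2/3$, minimizes their sum with a golden-section search, and compares the minimizer with the closed form $(3/(c\,c_1^2 \lambda n))^{1/3}$ obtained by setting the derivative to zero. The constants $c = c_1 = 1$, the search bounds, and all function names are hypothetical.

```python
def leading_order_cost(T, n, lam, c=1.0, c1=1.0):
    """Detection term 2n^2/T plus the leading resolution term c*lam*n^3*c1^2*T^2/3."""
    return 2.0 * n**2 / T + c * lam * n**3 * c1**2 * T**2 / 3.0


def argmin_golden(f, lo, hi, iters=200):
    """Golden-section search for the minimizer of a unimodal function on [lo, hi]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    x1 = b - inv_phi * (b - a)
    x2 = a + inv_phi * (b - a)
    for _ in range(iters):
        if f(x1) < f(x2):
            # The minimum lies in [a, x2]; reuse x1 as the new right probe.
            b, x2 = x2, x1
            x1 = b - inv_phi * (b - a)
        else:
            # The minimum lies in [x1, b]; reuse x2 as the new left probe.
            a, x1 = x1, x2
            x2 = a + inv_phi * (b - a)
    return (a + b) / 2


def leading_order_optimal_interval(n, lam, c=1.0, c1=1.0):
    """Numerically minimize the leading-order cost over T in [1e-6, 1e3]."""
    return argmin_golden(lambda T: leading_order_cost(T, n, lam, c, c1), 1e-6, 1e3)


for n in (1000, 8000):
    numeric = leading_order_optimal_interval(n, 1.0)
    closed_form = (3.0 / n) ** (1.0 / 3.0)  # (3 / (c * c1^2 * lam * n))^(1/3) with c = c1 = lam = 1
    print(n, numeric, closed_form)
```

Multiplying $n$ by 8 should halve the optimal interval under this truncated model, which is the $(\lambda n)^{-1/3}$ behaviour stated in Theorem 2.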
Figure 2: Cost of Deadlock Handling vs. Detection Interval ($n$: number of processes)

Figure 3: Cost of Deadlock Handling vs. Deadlock Formation Rate $\lambda$



As an illustration, we consider the following example. Let $C_R(t) = n^3(1 - \exp(-t))$ and $C_D = n^2$.
In accordance with Theorem 1, the long-run mean average cost of deadlock handling is thus written as

$$ C(T) = \frac{n^2 + \lambda n^3 (T + \exp(-T) - 1)}{T}. \tag{28} $$



Figs. 2-3 show log-log plots of a family of curves illustrating the dependence of the long-run mean
average cost of deadlock handling upon the detection interval. The x-axis denotes the deadlock detection
interval and the y-axis denotes the long-run mean average cost of deadlock handling.
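
The data behind such curves can be regenerated from the example cost function. The following sketch is an assumption-laden illustration rather than the script used for the figures: it evaluates $C(T) = (n^2 + \lambda n^3 (T + \exp(-T) - 1))/T$ on a logarithmically spaced grid and reports where each curve bottoms out. The grid range, grid size, and function names are hypothetical choices.

```python
import math


def example_cost(T, n, lam):
    """Example cost with C_R(t) = n^3 (1 - exp(-t)) and C_D = n^2."""
    # T + expm1(-T) equals T + exp(-T) - 1 but is more accurate for small T.
    return (n**2 + lam * n**3 * (T + math.expm1(-T))) / T


def log_spaced(lo, hi, count):
    """Return `count` points evenly spaced on a logarithmic scale from lo to hi."""
    step = (math.log(hi) - math.log(lo)) / (count - 1)
    return [math.exp(math.log(lo) + k * step) for k in range(count)]


def cost_curve(n, lam, lo=1e-3, hi=1e2, count=400):
    """Return (intervals, costs) for one curve of the family."""
    ts = log_spaced(lo, hi, count)
    return ts, [example_cost(t, n, lam) for t in ts]


for n in (50, 100, 200, 500, 1000):
    ts, costs = cost_curve(n, 1.0)
    k = min(range(len(costs)), key=costs.__getitem__)
    print(n, ts[k], costs[k])
```

Each curve decreases up to a single minimum and increases afterwards, so a grid search is enough to locate the optimum to within one grid step.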



# of Processes    Optimal Detection Interval ($\lambda = 1$)    Optimal Detection Interval ($\lambda = 1/30$)
50                0.214699 (s)                                  2.0223 (s)
100               0.148555 (s)                                  1.0973 (s)
200               0.103495 (s)                                  0.6832 (s)
500               0.064189 (s)                                  0.3942 (s)
1000              0.045402 (s)                                  0.2675 (s)

Table 1: Optimal Detection Interval vs. # of Processes
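
The optimal intervals for the example can be recomputed by root finding. Setting the derivative of $C(T) = (n^2 + \lambda n^3 (T + \exp(-T) - 1))/T$ to zero gives the condition $\lambda n\,(1 - (1 + T)\exp(-T)) = 1$, which has a finite solution only when $\lambda n > 1$. The sketch below solves this condition by bisection; the function name and tolerance are illustrative assumptions.

```python
import math


def example_optimal_interval(n, lam, tol=1e-12):
    """Solve lam * n * (1 - (1 + T) * exp(-T)) = 1 for T by bisection."""
    target = 1.0 / (lam * n)
    if target >= 1.0:
        raise ValueError("no finite optimum: lam * n must exceed 1")

    def g(T):
        # g is increasing in T, with g(0) = -target < 0 and g(inf) = 1 - target > 0.
        return 1.0 - (1.0 + T) * math.exp(-T) - target

    lo, hi = 0.0, 1.0
    while g(hi) < 0.0:  # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for lam in (1.0, 1.0 / 30.0):
    for n in (50, 100, 200, 500, 1000):
        print(lam, n, example_optimal_interval(n, lam))
```

Under these assumptions the computed intervals lie within about 5e-4 of the entries in Table 1.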



In Fig. 2, we plot the cost of deadlock handling against the deadlock detection interval for
different total numbers of processes: 50, 100, 200, 500, and 1000. Fig. 3 shows the
relationship between the overall cost of deadlock handling and the deadlock detection interval under
different deadlock formation rates: 1, 1/30, 1/60, 1/90, and 1/120 per second. Figs. 2-3
visualize the convexity that suggests the existence of an optimal detection frequency, and illustrate that
the overall cost of deadlock handling increases with the total number of processes and the deadlock
formation rate.

A detailed calculation given in Table 1 shows that as the number of processes in a distributed system
increases, the optimal detection interval decreases, which is in line with our theoretical
analysis. In the sequel, we study the impact of coordinated vs. random deadlock detection scheduling
on the performance of deadlock handling. We consider two strategies of deadlock detection
scheduling: (1) centralized, coordinated deadlock detection scheduling, and (2) fully distributed,
uncoordinated deadlock detection scheduling.

The centralized scheduling excels in its simplicity in implementation and system maintenance, 
but undermines the reliability and resilience against failures because one and only one process 
is elected as the initiator of deadlock detections in a distributed system. In contrast, the fully 
distributed scheduling excels in the reliability and resilience against failures because every process 
in the distributed system can independently initiate detections [TB], without a single point of 
failure. However, due to the lack of coordination in deadlock detection initiation among processes, 
it presents a different mathematical problem from the centralized deadlock detection scheduling. 

In the previous discussion we focused on deriving the optimal frequency of deadlock
detection in connection with the rate of deadlock formation and the message complexities of deadlock
detection and resolution algorithms, assuming that deadlock detections are centrally scheduled at a
fixed rate of $1/T$. To capture the lack of coordination in fully distributed scheduling, we study
the case where processes initiate the detection of deadlocks randomly and independently.

Let $n$ be the number of processes in a distributed system and $T$ be the optimal time interval
between any two consecutive deadlock detections in the centralized scheduling. Consider a fully
distributed deadlock detection scheduling, where each process independently initiates deadlock
detection at a rate of $1/(nT)$. Although the average interval between deadlock detections in the fully
distributed scheduling remains $T$ (the same as its centralized counterpart), the actual occurrence
times of those detections are likely to be non-uniformly spaced because the initiation of deadlock
detection is performed by the processes in a completely uncoordinated fashion.
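
The non-uniform spacing can be seen in a small simulation. The sketch below assumes, purely for illustration, that each process initiates detections as an independent Poisson stream of rate $1/(nT)$; the text above only fixes the rate, not the distribution. It merges the $n$ streams and summarizes the gaps between consecutive detections. Function names, the seed, and the parameter values are hypothetical.

```python
import random
import statistics


def merged_detection_times(n, T, horizon, seed=0):
    """Sorted initiation times of n independent streams, each of rate 1/(n*T).

    Assumption: each stream is Poisson, i.e. has exponential gaps with mean n*T.
    """
    rng = random.Random(seed)
    rate = 1.0 / (n * T)
    times = []
    for _ in range(n):
        t = rng.expovariate(rate)
        while t <= horizon:
            times.append(t)
            t += rng.expovariate(rate)
    times.sort()
    return times


def inter_detection_gaps(times):
    """Gaps between consecutive detections in the merged stream."""
    return [b - a for a, b in zip(times, times[1:])]


gaps = inter_detection_gaps(merged_detection_times(n=20, T=1.0, horizon=5000.0, seed=1))
print("mean gap:", statistics.mean(gaps))
print("std of gaps:", statistics.pstdev(gaps))
```

Under this assumption the mean gap stays close to $T$ while the standard deviation of the gaps is of the same order as $T$, whereas centralized scheduling has gaps exactly equal to $T$ and therefore zero spread.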

In the following we study the fully distributed (random) scheduling and compare it with the
centralized scheduling. Consider a sequence of independent and identically distributed (i.i.d.) random
variables $\{Y_i, i \ge 1\}$ defined on $(0, \infty)$ following a certain distribution $H$. The sequence
$\{Y_i, i \ge 1\}$ represents the inter-arrival times of deadlock detections initiated by the fully
distributed scheduling, and it is assumed to be independent of the arrival of deadlock formations. It is
obvious that the centralized scheduling is a special case of the fully distributed scheduling.

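Let $\mathcal{H}$ be the family of all distribution functions on $(0, \infty)$ with finite first moment. Namely,

$$ \mathcal{H} = \Big\{H : H \text{ is a CDF on } (0, \infty),\ \int_0^{\infty} \bar{H}(t)\,dt < \infty\Big\}, \tag{29} $$

where $\bar{H}(t) = 1 - H(t)$, $\forall t \ge 0$.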



The following theorem states that the lack of coordination in deadlock detection initiation by 
fully distributed scheduling will introduce additional overhead in deadlock handling. Therefore the 
fully distributed scheduling in general cannot perform as efficiently as its centralized counterpart. 

Theorem 3 Let $C_H$ denote the long-run mean average cost under fully distributed scheduling with
a random detection interval $Y$ characterized by a certain distribution $H \in \mathcal{H}$ with mean $\mu$,
and let $C(T)$ denote the long-run mean average cost under centralized scheduling with a fixed detection
interval $T$. Then

$$ C_H \ge C(T), \tag{30} $$

when $E(Y) = \mu = T$. ▲

Proof: Since the sequence $\{Y_i, i \ge 1\}$ of inter-arrival times of deadlock detection is assumed to be
independent of the Poisson deadlock formations, it is easy to see that the random costs over the
intervals $(0, Y_1], (Y_1, Y_1 + Y_2], \ldots$ are i.i.d. Using the same line of reasoning as in the proof of
Theorem 1, the long-run mean average cost is expressed as

$$ C_H = \frac{E(\text{random cost over } Y)}{E(Y)}, \tag{31} $$

where $Y$, with distribution $H \in \mathcal{H}$, is a random variable representing the interval between two
consecutive deadlock detections. Let $\xi(Y)$ be the random cost in the interval $Y$. The expected cost over
the interval $Y$ is given by

$$ E(\xi(Y)) = E\{E[\xi(Y) \mid Y]\} = \int_0^{\infty} E\Big\{C_D + \sum_{n=1}^{N(y)} C_R(y - S_n)\, I_{\{N(y) > 0\}}\Big\}\, dH(y), \tag{32} $$

where $S_n = \sum_{i=1}^{n} X_i$ denotes the time of the $n$th deadlock formation and $N(y)$ represents the
number of independent deadlocks that occurred in the time interval $(0, y)$. It follows from the
independence of $\{X_i, i \ge 1\}$ and $\{Y_i, i \ge 1\}$, and from Eq. (32), that the long-run mean average cost is

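$$ C_H = \frac{E(\xi(Y))}{E(Y)} = \frac{\int_0^{\infty} \big(C_D + \int_0^{y} \lambda C_R(t)\,dt\big)\, dH(y)}{E(Y)} = \frac{C_D}{E(Y)} + \frac{\int_0^{\infty} \big(\int_t^{\infty} \lambda C_R(t)\, dH(y)\big)\, dt}{E(Y)} = \frac{C_D}{E(Y)} + \frac{\lambda \int_0^{\infty} C_R(t)\, \bar{H}(t)\, dt}{E(Y)}. \tag{33} $$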
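
The step from the sum over deadlock formations to the integral $\lambda \int_0^{y} C_R(t)\,dt$ can be checked by simulation. The following is a minimal Monte Carlo sketch under stated assumptions: deadlocks form as a Poisson process of rate $\lambda$, the detection interval is fixed at $y$, the detection cost $C_D$ is left out, and the resolution cost is taken to be the hypothetical function $C_R(t) = 1 - \exp(-t)$. Function names, the seed, and the parameter values are illustrative.

```python
import math
import random


def resolution_cost(t):
    """Hypothetical nondecreasing resolution cost C_R(t) = 1 - exp(-t)."""
    return 1.0 - math.exp(-t)


def simulated_cycle_cost(y, lam, trials=20000, seed=0):
    """Average of sum_n C_R(y - S_n) over simulated cycles of length y.

    S_n are the formation times of a Poisson process of rate lam in (0, y).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s = rng.expovariate(lam)
        while s < y:
            total += resolution_cost(y - s)  # the deadlock persists for y - s
            s += rng.expovariate(lam)
    return total / trials


def expected_cycle_cost(y, lam):
    """Closed form lam * integral_0^y (1 - exp(-t)) dt."""
    return lam * (y + math.exp(-y) - 1.0)


print(simulated_cycle_cost(3.0, 2.0), expected_cycle_cost(3.0, 2.0))
```

The two printed values should agree up to Monte Carlo noise, which shrinks as `trials` grows.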

When $E(Y) = \mu = T$, meaning that the fixed deadlock detection interval $T$ equals the mean
value of the random detection interval $Y$, we compare the centralized (fixed) detection scheduling
at the rate of $1/T$ with the fully distributed (random) one at the mean rate of $1/E(Y) = 1/\mu$.
According to Theorem 1, the long-run mean average cost of fixed detection is given as

$$ C(T) = \frac{C_D}{T} + \frac{\lambda \int_0^{T} C_R(t)\,dt}{T}. \tag{34} $$

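Subtracting Eq. (34) from Eq. (33) yields

$$ \begin{aligned} C_H - C(T) &= \frac{\lambda}{\mu} \Big[\int_0^{\infty} C_R(t) \bar{H}(t)\,dt - \int_0^{\mu} C_R(t)\,dt\Big] = \frac{\lambda}{\mu} \Big[\int_{\mu}^{\infty} C_R(t) \bar{H}(t)\,dt - \int_0^{\mu} C_R(t) H(t)\,dt\Big] \\ &\ge \frac{\lambda}{\mu} \Big[C_R(\mu) \int_{\mu}^{\infty} \bar{H}(t)\,dt - C_R(\mu) \int_0^{\mu} H(t)\,dt\Big] = \frac{\lambda}{\mu} C_R(\mu) \Big[\int_{\mu}^{\infty} \bar{H}(t)\,dt - \int_0^{\mu} (1 - \bar{H}(t))\,dt\Big] \\ &= \frac{\lambda}{\mu} C_R(\mu) \Big[\int_0^{\infty} \bar{H}(t)\,dt - \mu\Big] = 0, \end{aligned} \tag{35} $$

where the inequality uses the fact that $C_R(t)$ is nondecreasing in $t$, and the last equality follows from
$\int_0^{\infty} \bar{H}(t)\,dt = E(Y) = \mu$. Hence we have

$$ C_H \ge C(T). \tag{36} $$

Theorem 3 is thus established.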
It can be seen from Eq. (36) that $C_H \ge C(T)$, and the equality holds if and only if $Y$ is a degenerate
random variable with $\mathrm{Prob}(Y = T) = 1$. Theorem 3 asserts that the fully distributed (random)
deadlock detection scheduling in general results in an increased overhead in overall deadlock
handling.
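
A numerical comparison makes the gap concrete. The sketch below is an illustration under assumptions, not part of the proof: it evaluates $C_H = C_D/E(Y) + \lambda \int_0^{\infty} C_R(t) \bar{H}(t)\,dt / E(Y)$ with a midpoint rule for two hypothetical interval distributions with mean $T$ (exponential, and uniform on $(0, 2T)$) and compares them with the fixed-interval cost, using the example cost functions $C_R(t) = n^3(1 - \exp(-t))$ and $C_D = n^2$. Function names, the truncation point, and the step count are illustrative choices.

```python
import math


def cost_random_schedule(c_d, lam, c_r, survival, mean, upper, steps=50000):
    """C_H = C_D/E(Y) + lam * integral_0^upper C_R(t) * Hbar(t) dt / E(Y).

    The integral is truncated at `upper` and evaluated with the midpoint rule.
    """
    h = upper / steps
    integral = sum(c_r((k + 0.5) * h) * survival((k + 0.5) * h) for k in range(steps)) * h
    return c_d / mean + lam * integral / mean


def cost_fixed_schedule(c_d, lam, c_r, T, steps=50000):
    """C(T): the degenerate case Y = T, where Hbar(t) = 1 on [0, T) and 0 afterwards."""
    return cost_random_schedule(c_d, lam, c_r, lambda t: 1.0, T, T, steps)


n, lam, T = 100, 1.0, 0.15


def example_c_r(t):
    """Example resolution cost C_R(t) = n^3 (1 - exp(-t))."""
    return n**3 * (1.0 - math.exp(-t))


fixed = cost_fixed_schedule(n**2, lam, example_c_r, T)
exponential = cost_random_schedule(
    n**2, lam, example_c_r, lambda t: math.exp(-t / T), T, 50 * T)
uniform = cost_random_schedule(
    n**2, lam, example_c_r, lambda t: max(0.0, 1.0 - t / (2 * T)), T, 2 * T)
print(fixed, uniform, exponential)
```

Both randomized schedules have the same mean interval as the fixed one, so any difference in cost comes only from the spread of the detection times.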

5 Conclusion 

Deadlock detection scheduling is an important, yet often overlooked aspect of distributed deadlock
detection and resolution. The performance of deadlock handling depends not only upon the per-execution
complexity of deadlock detection/resolution algorithms, but also fundamentally
upon deadlock detection scheduling and the rate of deadlock formation. Excessive initiation of
deadlock detection results in an increased number of message exchanges in the absence of deadlocks,
while insufficient initiation of deadlock detection incurs an increased cost of deadlock resolution
in the presence of deadlocks. As a result, reducing the per-execution cost of distributed deadlock
detection/resolution algorithms alone does not guarantee an overall performance improvement in
deadlock handling.

The main thrust of this paper is to bring an awareness to the problem of deadlock detection 
scheduling and its impact on the overall performance of deadlock handling. The key element in 
our approach is to develop a time-dependent model that associates the deadlock resolution cost 
with the deadlock persistence time. It assists the study of time-dependent deadlock resolution 
cost in connection with the rate of deadlock formation and the frequency of deadlock detection 
initiation, differing significantly from the past research that focuses on minimizing per-detection 
and per-resolution costs. 

Our stochastic analysis, which solidifies the ideas presented in [101 EH El) shows that 
there exists a unique deadlock detection frequency that guarantees a minimum long-run mean 
average cost for deadlock handling when the total number of processes in a distributed system is
sufficiently large, and that the cost of overall deadlock handling grows linearly with the rate of 
deadlock formation. 

In addition, we study the fully distributed (random) deadlock detection scheduling and its 
impact on the performance of deadlock handling. We prove that in general the lack of coordination 
in deadlock detection initiation among processes will increase the overall cost of deadlock handling. 

The theoretical results obtained in this paper could help system designers and practitioners to better
understand the fundamental performance tradeoff between deadlock detection and deadlock resolution
costs, as well as the innate dependency of the optimal detection frequency upon the deadlock formation
rate. However, many questions remain open regarding how to use these theoretical results to fine-tune
the performance of a distributed system. Determination of the actual rate of deadlock formation
and verification of the Poisson process are problems of great complexity that can be influenced by
many known and unknown factors such as the granularity of locking, the actual distribution of resources,
the process mix, and resource request and release patterns [26]. Tapping into system logging files and
inferring the actual deadlock formation rate via data mining could provide an effective and feasible
way to translate theoretical insights into actual system performance gain.